Quasiperiodic Biosequences and Modulo Incidence Matrices

Authors

  • Honghui Wan
  • Enmin Song
Abstract

Algorithm development for finding quasiperiodic regions in sequences is at the core of many problems arising in biological sequence analysis. We solve an important problem in this area. Let $A$ be an alphabet of size $n$ and let $A^l$ denote the set of sequences of length $l$ over $A$. Given a sequence $S = s_1 s_2 \cdots s_l \in A^l$, a positive integer $p$ is called a period of $S$ if $s_i = s_{i+p}$ for $1 \le i \le l - p$. $S$ is called $p$-periodic if it has a minimum period $p$. Let $\Pi_l(p)$ denote the set of $p$-periodic sequences in $A^l$. A natural measure of "nearness to $p$-periodicity" for $S$ is the average Hamming distance to the nearest $p$-periodic sequence: $D(S) = \min_{T \in \Pi_l(p)} D(S, T)$. If $T \in \Pi_l(p)$ is a sequence such that $D(S, T) = D(S)$, then $T$ is called a nearest $p$-periodic sequence of $S$, and $S$ is called $p$-quasiperiodic associated with the score $D(S)$. This paper develops an efficient algorithm for finding a nearest $p$-periodic sequence of $S$ by means of its modulo-$p$ incidence matrix. Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (q+1, \ldots, q+1, q, \ldots, q)$, with $r$ entries equal to $q+1$ and $p-r$ entries equal to $q$, where $l = \alpha_1 + \alpha_2 + \cdots + \alpha_n$ is a partition of $l$, and $q$ is the quotient and $r$ the remainder when $l$ is divided by $p$. This paper shows that there exists a sequence in $A^l$ whose modulo-$p$ incidence matrix has row sum vector $\alpha$ and column sum vector $\beta$.
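The abstract does not spell out the modulo-p incidence matrix, so the following Python sketch rests on an assumption: rows are indexed by the letters of the alphabet, columns by the residue class of the (0-based) position modulo p, and each entry counts how often that letter occurs in that class. This reading matches the row and column sums mentioned in the abstract, but it is not taken from the paper. The function names `modulo_incidence_matrix` and `nearest_with_period` are hypothetical. The second function picks the most frequent letter in each column, which minimizes the number of mismatches among sequences that have p as a period; it does not enforce that p is the minimum period, and it reports a raw mismatch count rather than an average, so it is a simplification and not the authors' algorithm.

```python
def modulo_incidence_matrix(seq, p, alphabet):
    """Count letters per residue class of the 0-based position modulo p.

    Assumed layout: matrix[k][j] is the number of positions i with
    i % p == j and seq[i] == alphabet[k].
    """
    index = {letter: k for k, letter in enumerate(alphabet)}
    matrix = [[0] * p for _ in alphabet]
    for i, letter in enumerate(seq):
        matrix[index[letter]][i % p] += 1
    return matrix


def nearest_with_period(seq, p, alphabet):
    """Return a closest sequence having p as a period, and the mismatch count.

    Simplification: the minimum-period requirement is not checked.
    Ties within a column go to the letter listed first in `alphabet`.
    """
    matrix = modulo_incidence_matrix(seq, p, alphabet)
    motif = []
    for j in range(p):
        best_row = max(range(len(alphabet)), key=lambda k: matrix[k][j])
        motif.append(alphabet[best_row])
    nearest = "".join(motif[i % p] for i in range(len(seq)))
    mismatches = sum(a != b for a, b in zip(seq, nearest))
    return nearest, mismatches


if __name__ == "__main__":
    s = "ACGACGACTACG"
    m = modulo_incidence_matrix(s, 3, "ACGT")
    print(m)                                  # one row per letter, one column per residue
    print([sum(row) for row in m])            # row sums: letter counts
    print([sum(col) for col in zip(*m)])      # column sums: sizes of residue classes
    print(nearest_with_period(s, 3, "ACGT"))
```

Under this layout the row sums are the letter counts of the sequence, and the column sums depend only on the length and on p, which is why the abstract can fix the column sum vector in advance.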

Similar resources

Unsupervised Pattern Discovery in Biosequences Using Aligned Pattern Clustering

Protein, RNA and DNA are made up of sequences of amino acids/nucleotides, which interact among themselves via binding. For example, (1) protein-DNA binding regulates gene transcription [1]; and (2) protein-protein binding plays important roles in cell cycle control and signal transduction [2]. The binding is maintained by either the direct participation or assistance of conserved short segments ...

Neweyes: A System for Comparing Biological Sequences Using the Running Karp-Rabin Greedy String-Tiling Algorithm

A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves tiling one string with matching substrings of a second string. In practice, RKR-GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect tr...

Running Karp-Rabin Matching and Greedy String Tiling

A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves tiling one string with matching substrings of a second string. In practice, RKR-GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect t...

Neweyes: A System for Comparing Biological Sequences Using the Running Karp-Rabin Greedy String-Tiling Algorithm

A system for aligning nucleotide or amino acid biosequences is described. The system, called Neweyes, employs a novel string matching algorithm, Running Karp-Rabin Greedy String Tiling (RKR-GST), which involves tiling one string with matching substrings of a second string. In practice, RKR-GST has a computational complexity that appears close to linear. With RKR-GST, Neweyes is able to detect t...

On Hadamard Modulo Prime p Matrices of Size at most 2p + 1

In this note, we continue the study of Hadamard Modulo Prime (HMP) matrices initiated in recent articles [5] – [6]. Namely, we present some new non-existence and classification results for HMP matrices whose size is relatively small with respect to the modulus.

Publication date: 2002